DETAILED ACTION 

This final office action is prepared in response to the applicant's amendments and 
arguments filed on September 8, 2009 as a reply to the non-final office action mailed on July 6, 
2009. 

No claim has been cancelled or amended. 
Claims 17-52 are now pending. 

Response to Arguments 

Applicant's arguments filed on September 8, 2009 have been carefully considered but 
are deemed unpersuasive for the reasons given in Examiner's response to the arguments below. 

Accordingly, THIS ACTION IS MADE FINAL. See MPEP 706.07(a). Applicant is 
reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 

1. Regarding the rejection of claim 1, applicant argued that "Yamamoto did not teach or 
suggest that the source tape, itself, is ran faster than normal speed" (Remarks, page 11, para. 1). 

In response, Examiner would like to point out that the claim makes no mention of a "source 
tape", let alone of running the source tape faster than normal speed, rendering Applicant's argument 
moot. 

Further regarding the rejection of claim 1, applicant argued that in Yamamoto, "the 
data recorded on disc recorder 13 is on a compressed time axis and not at normal speaking 
speed." 

In response, Examiner would like to point out that in Yamamoto, the disc recorder 13 
records data on discs 24a, 24b, ... 24k on a time-axis-compressed basis. The compressed time 
axis serves to reduce the overall time for the fabrication of slave tapes (Yamamoto, col. 2, lines 
23-30). Once the data is duplicated and recorded on discs 24a, ... 24k, it must be played out 
at normal speaking speed in order for the data to be comprehensible. 

Furthermore, Examiner would like to clarify the meaning of the following statement in 
the office action. 

On page 4 of the office action mailed on July 6, 2009, Examiner stated that 
"Russell did not explicitly disclose that the analog audio signal is played at an increased 
speed, and that the audio information data and the signal pause duration data of the resulted 
digital audio signal represent outputs at a normal speaking speed." 

What Examiner meant to convey through this statement was that Russell did not 
explicitly disclose the particular combination of said claim limitations. Examiner's statement 
does not mean that Russell fails to explicitly disclose any one of said claim limitations. Russell 
does indeed disclose that "the signal pause duration data of the resulted digital audio signal represent 
outputs at a normal speaking speed" through its disclosure in col. 7, lines 4-7 and col. 7, lines 13-15. 
In particular, col. 7, lines 13-15 disclosed "When playing the recalled speech, the present 
invention may optionally skip the identified speech pauses and non-speech utterances," which 
implies that the pauses and non-speech utterances represent output at a normal speaking speed. 

Examiner's position is that Russell did not explicitly disclose that "the analog audio 
signal is played at an increased speed." 

As the claim does not state why the analog audio signal is played at an increased speed, 
Examiner's understanding is that the increased speed reduces the time required for loading the 
audio signal into a computer for processing. Therefore, Examiner relies on Yamamoto to teach 
this limitation because Yamamoto clearly disclosed in col. 1, lines 19-22 and lines 38-45 that a 
method well known in the prior art for fast mass reproduction of music or language tapes is to drive 
the source tape at a high speed (such as 32 times the normal speed). 

It would have been obvious for one skilled in the art to apply Yamamoto's teaching to 
Russell such that a pre-recorded audio signal is loaded into the processing device 43 in Fig. 3 in 
reduced time, cutting down the delay and giving a user of the system a better experience. 

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

2. Claims 17-18, 20, 26-27, 32-33, 35, 41-42, 46 and 50 are rejected under 35 
U.S.C. 103(a) as obvious over Russell et al. (U.S. Patent No. 5,526,407, hereinafter 
"Russell"), in view of Yamamoto et al. (U.S. Patent No. 4,355,338, hereinafter "Yamamoto"). 

Regarding claim 17, Russell disclosed a method for digitally recording an analog audio 
signal, the method comprising: 

(a) receiving an analog audio signal (Russell, Fig. 2 and col. 9, line 30 disclosed "analog 
front end", which records analog signal) containing audio information and signal pauses 
information (a speech signal by nature contains speech information (i.e., audio) and non-speech 
information (i.e. signal pauses), as evident in Russell, col. 7, lines 4-5); 

(b) converting the analog audio signal into digital audio signal comprising audio 
information data and signal pause duration data (Russell, Fig. 2 and col. 9, lines 31-33); 

(c) storing the audio information data of the digital audio signal as information data 
blocks and the signal pause duration data of the digital audio signal as signal pause data blocks 
having different time durations in a memory (Russell, col. 6, lines 43-53 disclosed storing the 
speech stream and structure which represents categorized portions of the speech stream; Russell, 
col. 15 lines 8-11 further disclosed that the characterization of speech includes the phrase time 
duration and the presence of pauses), 

wherein the audio information data and the signal pause duration data of the resulted 
digital audio signal represent outputs at a normal speaking speed (Russell, col. 7, lines 4-7 and 
col. 7, lines 13-15. In particular, col. 7, lines 13-15 disclosed "When playing the recalled speech, 
the present invention may optionally skip the identified speech pauses and non-speech 
utterances," which implies that the pauses and non-speech utterances represent output at a 
normal speaking speed); 

(d) generating a plurality of audio information data sequences by sequentially reading the 
information data blocks and the signal pause data blocks (Russell, col. 14, lines 43-50 disclosed 
that a speech process program 136 separates the speech into phrases demarcated by perceptible 
pauses; col. 17, line 67 disclosed "RIFF chunk" as an audio information data sequence), the 
audio information data sequences being separated by the signal pause data blocks if an assigned 
time duration of the signal pause data block is higher than a predetermined time duration (col. 
17, line 64 disclosed that the silence of a specified second duration marks a signal pause). 
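
For illustration only, the segmentation recited in step (d) can be sketched as follows. The sketch is not taken from Russell or from the application. The block layout (a kind tag paired with either a payload identifier or a pause duration in milliseconds), the function name split_sequences and the parameter min_pause_ms are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical block layout: ("info", payload_id) or ("pause", duration_ms).
Block = Tuple[str, int]


def split_sequences(blocks: List[Block], min_pause_ms: int) -> List[List[Block]]:
    """Read blocks in order and cut a new sequence at every long pause.

    A pause block whose duration exceeds min_pause_ms separates two
    sequences and is not included in either. Shorter pauses stay inside
    the sequence in which they occur.
    """
    sequences: List[List[Block]] = []
    current: List[Block] = []
    for kind, value in blocks:
        if kind == "pause" and value > min_pause_ms:
            if current:
                sequences.append(current)
            current = []
        else:
            current.append((kind, value))
    if current:
        sequences.append(current)
    return sequences


example = [("info", 1), ("pause", 300), ("info", 2), ("pause", 2500), ("info", 3)]
sequences = split_sequences(example, min_pause_ms=2000)
```

With a 2000 ms threshold, the 300 ms pause remains inside the first sequence and the 2500 ms pause separates the first sequence from the second.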

Russell did not explicitly disclose receiving an analog audio signal played at an increased 
speed. 

However, Yamamoto disclosed a method for fast reproduction of music or language 
tapes, where it is known in the prior art that, for fast mass reproduction of music or language tapes, the 
source tape can be driven at a high speed (such as 32 times the normal speed) (see Yamamoto, 
col. 1, lines 19-22 and lines 38-45). 

One of ordinary skill in the art would have been motivated to combine Russell and 
Yamamoto because both disclosed a system and method for converting an analog signal to a 
digital signal using an analog-to-digital converter (Russell, Fig. 2, "Analog Front-end"; Yamamoto, 
Fig. 2, "A/D 9"). 

Therefore, it would have been obvious for one skilled in the art to apply Yamamoto's 
teaching to Russell such that a pre-recorded audio signal is loaded into the processing device 43 in 
Fig. 3 in reduced time, allowing the speech processing modules (Fig. 4, "voice print info", 
"speech extraction", "codec") sufficient time to process the speech and produce output 
without introducing a noticeable delay to the user. The combination yields the highly desirable 
result of providing a better user experience. 

Regarding claim 18, the combination of Russell and Yamamoto disclosed the method of 
claim 17. 

Russell further disclosed producing an index table by sequentially reading the 
information data blocks and the signal pause data blocks (Russell, col. 14, lines 43-67 and col. 
11, lines 1-17). 

Regarding claim 20, the combination of Russell and Yamamoto disclosed the method of 
claim 18. 

Russell further disclosed wherein producing the index table comprises processing the 
sequentially read data blocks (col. 14, lines 43-67 and col. 11, lines 1-17). 

Regarding claim 32, Russell disclosed a method for digitally recording an analog audio 
signal with automatic indexing, the method comprising: 

(a) receiving an analog audio signal containing audio information and signal (Fig. 2 and 
col. 9, line 30 disclosed "analog front end", which records analog signal; a speech signal by 
nature contains speech information (i.e., audio) and non-speech information (i.e. signal pauses), 
as evident in the disclosure in col. 7, lines 4-5); 

(b) converting the analog audio signal into digital audio data comprising audio 
information data and signal pause duration data (Fig. 2 and column 9, lines 31-33); 

(c) storing the converted digital audio data (col. 6, lines 44-45, "stores the speech stream 
in at least a temporary storage"); 

(d) reading the stored digital audio data sequentially (col. 14, lines 48-49 disclosed that 
the speech process program 136 allocates buffers to receive the real-time speech and separate the 
speech into phrases, which implies that the speech process program 136 must read the real-time 
speech sequentially); 

(e) decoding whether the digital audio data are audio information data or signal pause 
duration data (col. 15, lines 46-53 and col. 17, lines 63-65); 

(f) storing the audio information data as information data blocks and the signal 
pause duration data as signal pause data blocks in a memory (column 6, lines 43-53 disclosed 
storing the speech stream and structure which represents categorized portions of the speech 
stream; col. 15 lines 8-11 further disclosed that the characterization of speech includes the phrase 
time duration and the presence of pauses); and 

(g) reading the stored data blocks sequentially in order to produce a data structure for 
managing the indexing (col. 14, lines 43-50 disclosed that a speech process program 136 separates 
the speech into phrases demarcated by perceptible pauses; col. 17, line 67 disclosed "RIFF 
chunk" as an audio information data sequence), 

wherein a succession of information data blocks which is not interrupted by a signal 
pause with a pre-determined duration being detected as an audio information data sequence 
whose start and end are stored in the data structure for managing the indexing (col. 17, lines 63-64 
disclosed that sound of at least a certain first threshold duration followed by silence of a 
specified second duration indicates that a phrase has been completed; and the times at which the 
phrases began and ended are recorded). 

Russell did not explicitly disclose receiving the analog audio signal played at an 
increased speed. 

However, Yamamoto disclosed a method for fast reproduction of music or language 
tapes, where it is known in the prior art that, for fast mass reproduction of music or language tapes, the 
source tape can be driven at a high speed (such as 32 times the normal speed) (see Yamamoto, 
col. 1, lines 19-22 and lines 38-45). 

One of ordinary skill in the art would have been motivated to combine Russell and 
Yamamoto because both disclosed a system and method for converting an analog signal to a 
digital signal using an analog-to-digital converter (Russell, Fig. 2, "Analog Front-end"; Yamamoto, 
Fig. 2, "A/D 9"). 

Therefore, it would have been obvious for one skilled in the art to apply Yamamoto's 
teaching to Russell such that a pre-recorded audio signal is loaded into the processing device 43 in 
Fig. 3 in reduced time, allowing the speech processing modules (Fig. 4, "voice print info", 
"speech extraction", "codec") sufficient time to process the speech and produce output 
without introducing a noticeable delay to the user. The combination yields the highly desirable 
result of providing a better user experience. 

Claim 47 is rejected under the same rationale as claim 32 as it lists elements that are all 
listed in claim 32 and disclosed by Russell. 

Claim 50 is rejected under the same rationale as claim 32 as it lists elements that are all 
listed in claim 32 and disclosed by Russell. 



Application/Control Number: 10/031,471 
Art Unit: 2444 



Regarding claims 26 and 41, the combination of Russell and Yamamoto disclosed the 
method of claims 17 and 32. 

Russell further disclosed wherein the digital audio data are compressed before storage 
(col. 9, lines 39-43, "codec" and col. 14, line 66, "to compress the speech"). 

Regarding claims 27 and 42, the combination of Russell and Yamamoto disclosed the 
method of claims 17 and 32. 

Russell further disclosed wherein each information data block contains an information 
data block identifier and audio information data, and each signal pause data block contains a 
signal pause data block identifier and signal pause duration data (Russell, col. 11, lines 1-27). 

Regarding claim 33, the combination of Russell and Yamamoto disclosed the method of 
claim 32. 

Russell further disclosed wherein the data structure produced for managing the indexing 
is an index table (Fig. 5 and col. 11, lines 1-27). 

Regarding claim 35, the combination of Russell and Yamamoto disclosed the method of 
claim 33. 

Russell further disclosed processing and producing the index table while sequentially 
reading the data blocks (col. 14, lines 43-67 and col. 11, lines 1-17). 

3. Claims 19 and 34 are rejected under 35 U.S.C. 103(a) as obvious over Russell and 
Yamamoto, in view of Welch et al. (U.S. Patent No. 4,336,421, hereinafter "Welch"). 

Regarding claims 19 and 34, the combination of Russell and Yamamoto disclosed the 
method of claims 18 and 33. 

Russell further disclosed that the start and end of an audio information data sequence are 
stored as start and end addresses (Russell, Fig. 5 and column 18). 

Russell did not explicitly disclose that the start address and end address are for a first 
address pointer and a second address pointer of the index table. The examiner interprets the first 
address pointer and the second address pointer as the pointer to the start and end of an audio 
information data in the memory, as it is unclear to the examiner what a first address pointer and a 
second address pointer of the index table entail. 

However, in the same field of endeavor, Welch disclosed storing the start and end of a 
speech segment as the start and end addresses in the memory (Welch, col. 13, lines 42-64). 

One of ordinary skill in the art would have been motivated to combine Russell and Welch 
because both disclosed detecting voice sounds and pauses in speech (Russell, "Summary of the 
Invention" and Welch, "Summary of the Invention"), and Welch supplemented Russell's 
teaching with implementation details relating to buffer management for the speech data. 

Therefore, it would have been obvious for one to combine Russell and Welch such that 
Russell's invention can be embodied using techniques taught by Welch. 

4. Claims 21-23, 30, 36-38, 45, 48 and 51 are rejected under 35 U.S.C. 103(a) as obvious 
over Russell and Yamamoto, in view of Freudberg et al. (U.S. Patent No. 4,696,031, hereinafter 
"Freudberg"). 

Regarding claims 21, 36, 48 and 51, the combination of Russell and Yamamoto 
disclosed the method of claims 20, 35, 47 and 50. 

Russell further disclosed filtering out short spoken utterances that are not useful for the 
user (Russell, col. 16, lines 21-46). 

Russell did not explicitly disclose filtering based on a particular minimum value that the number of 
information blocks does not exceed and a particular first time limit value that the signal pause of the 
two adjacent signal pause data blocks exceeds. 

However, Freudberg disclosed filtering out short bursts of energy by combining an ON 
time of less than 200 msec with the OFF times of the two adjacent OFF intervals to form a single 
OFF interval, which is essentially the same as what is described in the claim (Freudberg, column 6, 
lines 32-49). 
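
For illustration only, the merging of a short ON burst into its neighboring OFF intervals can be sketched as follows. The sketch is not taken from Freudberg. The interval representation (a state tag paired with a duration in milliseconds) and the function name merge_short_bursts are assumptions made for the example; only the 200 msec figure comes from the passage cited above.

```python
from typing import List, Tuple

# Hypothetical interval layout: ("ON", duration_ms) or ("OFF", duration_ms).
Interval = Tuple[str, int]


def merge_short_bursts(intervals: List[Interval], min_on_ms: int = 200) -> List[Interval]:
    """Treat ON intervals shorter than min_on_ms as silence.

    A short ON burst is reclassified as OFF, and consecutive intervals
    with the same state are then combined into a single interval whose
    duration is the sum of its parts.
    """
    merged: List[Interval] = []
    for state, duration in intervals:
        if state == "ON" and duration < min_on_ms:
            state = "OFF"
        if merged and merged[-1][0] == state:
            merged[-1] = (state, merged[-1][1] + duration)
        else:
            merged.append((state, duration))
    return merged


example = [("ON", 900), ("OFF", 400), ("ON", 120), ("OFF", 600), ("ON", 800)]
filtered = merge_short_bursts(example)
```

In this example the 120 msec burst and its two adjacent OFF intervals (400 msec and 600 msec) become one OFF interval of 1120 msec.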

One would have been motivated to combine Russell and Freudberg because both 
disclosed silence (i.e. pause) detection and speech signal segmentation (Russell, col. 17, lines 60-65; 
Freudberg, col. 3, lines 28-40, "signal-ON" and "signal-OFF"). 

Therefore, it would have been obvious for one to incorporate Freudberg's method of 
filtering out short energy bursts using threshold values into Russell to filter out information blocks 
that are falsely detected as speech blocks so as to save system processing time and reduce the error 
rate. 

Regarding claims 22 and 37, the combination of Russell, Yamamoto and Freudberg 
disclosed the method of claims 21 and 36. 

Russell did not explicitly disclose wherein the minimum value is 1. 

Freudberg disclosed the minimum value for ON time is 200 msec (Freudberg, col. 6, 
lines 32-49). 

Examiner considers the difference in the minimum value as an implementation choice 
that may vary in different embodiments of the same inventive idea. 

The rationale for the motivation to combine Russell and Freudberg is the same as that 
provided in the rejection of claim 21. 

Regarding claims 23 and 38, the combination of Russell, Yamamoto and Freudberg 
disclosed the method of claims 21 and 36. 

Russell did not explicitly disclose wherein the first time limit value is 0.5 seconds. 

Freudberg disclosed using an OFF interval (Freudberg, col. 6, lines 50-55). 

Examiner considers the value of 0.5 seconds specified in the claim as an implementation 
choice of Freudberg's OFF interval, which may vary in different embodiments of the same 
inventive idea. 

The rationale for the motivation to combine Russell and Freudberg is the same as that 
provided in the rejection of claim 21. 

Regarding claims 30 and 45, the combination of Russell and Yamamoto disclosed the 
method of claims 17 and 32. 

Russell did not explicitly disclose wherein a succession of information data blocks which 
is not separated by a signal pause data block whose signal pause duration data amount to a signal 
pause of more than 2 seconds is detected as an audio information data sequence. 

However, Freudberg disclosed a signal pause detection method using an ON time to 
determine the minimum duration of speech intervals (Freudberg, col. 6, lines 32-49). 

Examiner considers the time duration of 2 seconds recited in the present claim as an 
implementation choice of Freudberg's ON time, which may vary in different embodiments of the 
same inventive concept. 

5. Claims 24-25, 31, 39-40, 46, 49 and 52 are rejected under 35 U.S.C. 103(a) as obvious 
over Russell and Yamamoto, in view of Imai et al. (U.S. 2001/0010037, hereinafter "Imai"). 

Regarding claims 24, 39, 49 and 52, the combination of Russell and Yamamoto 
disclosed the method of claims 20, 35, 48 and the apparatus of claim 50. 

Russell did not explicitly disclose while processing the data, overwriting the signal 
duration data of signal pause data blocks whose signal pause duration exceeds a particular 
second time limit value with signal duration data having particular nominal signal duration. 

However, Imai (2001/0010037) disclosed a speech rate conversion system that replaces a 
non-speech interval exceeding a constant continued time with a break of the constant continued 
time that is shorter than the actual non-speech interval (Imai, [0034]). 
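
For illustration only, the replacement of an over-long pause with a shorter nominal pause can be sketched as follows. The sketch is not taken from Imai or from the application. The block layout (a kind tag paired with either a payload identifier or a pause duration in milliseconds), the function name shorten_long_pauses and its parameters are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical block layout: ("info", payload_id) or ("pause", duration_ms).
Block = Tuple[str, int]


def shorten_long_pauses(blocks: List[Block], limit_ms: int, nominal_ms: int) -> List[Block]:
    """Overwrite the duration of every pause longer than limit_ms.

    Pauses at or below the limit and all information blocks are
    returned unchanged, so no audio information is dropped.
    """
    result: List[Block] = []
    for kind, value in blocks:
        if kind == "pause" and value > limit_ms:
            result.append((kind, nominal_ms))
        else:
            result.append((kind, value))
    return result


example = [("info", 1), ("pause", 12000), ("info", 2), ("pause", 1500)]
shortened = shorten_long_pauses(example, limit_ms=10000, nominal_ms=2000)
```

The example uses a 10 second limit and a 2 second nominal duration, so the 12 second pause is stored as 2 seconds and the 1.5 second pause is left as it is.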

One would have been motivated to combine Russell and Imai because both disclosed 
non-speech interval detection. 

Therefore, it would have been obvious for one to add Imai's speech rate conversion to 
Russell to save storage space and adjust playback speed without losing audio information. 

Regarding claims 25 and 40, the combination of Russell, Yamamoto and Imai disclosed 
the method of claims 24 and 39. 

Imai further disclosed wherein the second time limit value is 10 seconds and the nominal 
signal duration is 2 seconds (Examiner considers the threshold values recited in the claim as an 
implementation choice of Imai's constant continued time disclosed in [0034] which may vary in 
different embodiments of the same inventive idea). 

The rationale for the motivation to combine Russell and Imai is the same as that provided 
in the rejection of claim 24. 

Regarding claims 31 and 46, the combination of Russell and Yamamoto disclosed the 
method of claims 17 and 32. 

Russell did not explicitly disclose wherein, when receiving the analog audio signal, the 
playing speed of a data medium on which the analog audio signal is recorded can be set. 

However, Imai disclosed a method for converting speech rate at a preset scaling factor 
using speech interval detection (Imai, "Abstract"). 

The rationale for the motivation to combine Russell and Imai is the same as that provided 
in the rejection of claim 24. 

6. Claims 28-29 and 43-44 are rejected under 35 U.S.C. 103(a) as obvious over Russell and 
Yamamoto, in view of Gan et al. (IEEE Publication "Implementation of Silence Compression 
Scheme For G.723.1 Speech Coder Using TI TMS320S75 DSP Chip", 1997, hereinafter "Gan"). 

Regarding claims 28 and 43, the combination of Russell and Yamamoto disclosed the 
method of claims 17 and 32. 

Russell did not explicitly disclose wherein all the data blocks are of the same size and 
correspond to a particular basic unit of duration. 

However, Gan disclosed an implementation of silence compression for the G.723.1 coder, 
wherein the G.723.1 coder has a frame size of 30 ms and a compressed frame size of 24 bytes at the 6.3 
kb/s coding rate. 
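
For illustration only, the 24-byte figure follows from the frame duration and the coding rate by simple arithmetic: 6300 bits per second over 30 ms gives 189 bits, which occupy 24 whole bytes. The function name frame_bytes below is an assumption made for the example and does not appear in Gan.

```python
import math


def frame_bytes(bit_rate_bps: int, frame_ms: int) -> int:
    """Return the whole number of bytes needed to hold one coded frame."""
    bits_per_frame = bit_rate_bps * frame_ms / 1000
    return math.ceil(bits_per_frame / 8)


# 6.3 kb/s coding rate with a 30 ms frame, the values quoted above.
size = frame_bytes(6300, 30)
```

Because every frame covers the same 30 ms and has the same coded size, each data block corresponds to one fixed basic unit of duration.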

One would have been motivated to combine Russell and Gan because both disclosed 
silence detection and codec. 

It would have been obvious for one to incorporate Gan's silence compression and 
G.723.1 into Russell as one of the possible embodiments of Russell's speech peripheral. 

Regarding claims 29 and 44, the combination of Russell, Yamamoto and Gan disclosed 
the method of claims 28 and 43. 

Russell did not explicitly disclose wherein the basic unit of duration is 30 ms. 

However, as already addressed above in the rejection of claim 28, Gan disclosed a silence 
compression implementation for the G.723.1 codec, where the G.723.1 codec has a basic frame 
duration of 30 ms. 

The rationale for the motivation to combine Russell and Gan is the same as that provided 
in the rejection of claim 28. 

Conclusion 

THIS ACTION IS FINAL. Applicant is reminded of the extension of time policy as set forth in 
37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within TWO 
MONTHS of the mailing date of this final action and the advisory action is not mailed until after 
the end of the THREE-MONTH shortened statutory period, then the shortened statutory period 
will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 
1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, 
will the statutory period for reply expire later than SIX MONTHS from the mailing date of this 
final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to SHIRLEY X. ZHANG whose telephone number is (571) 270-5012. 
The examiner can normally be reached on Monday through Friday 8:00am - 5:30pm EST. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, William Vaughn, can be reached at (571) 272-3922. The fax phone number for the 
organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would 
like assistance from a USPTO Customer Service Representative or access to the automated 
information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/S.X.Z./ Art Unit 2444 
12/3/2009 

/Paul H Kang/ 

Primary Examiner, Art Unit 2444 



